Dynamic facial feature substitution for video conferencing

ABSTRACT

In an approach to determine facial feature substitution in a video conference, a computer receives one or more pre-recorded videos of an attendee of a video conference. The computer then substitutes one or more portions of the one or more pre-recorded videos into an avatar, the substitution corresponding to at least one targeted facial feature of the attendee in the video conference.

BACKGROUND OF THE INVENTION

The present invention relates generally to the field of video and web conferencing, and more particularly to providing dynamic facial feature substitution in an avatar for a video conference.

Global business meetings commonly occur by video conference, connecting people across multiple continents and time zones. Video conferences enable participants to share video and audio content with each other in a computing environment across multiple continents. A communication device at each location with video and audio capability, such as a video camera or, more commonly, a tablet, a laptop, a smart phone or a similar device utilizing a video conference platform, program or application, may be used for video conference meetings. Video conferences provide attendees with the ability to interact and more clearly communicate using visual and verbal communication cues. Attendees may use facial expressions to aid verbal communication and, through face to face communication, develop relationships that aid in business endeavors and team building.

SUMMARY

An embodiment of the present invention discloses a method, a computer program product, and a computer system for determining facial feature substitution in a video conference. A computer receives one or more pre-recorded videos of an attendee of a video conference. The computer then substitutes one or more portions of the one or more pre-recorded videos into an avatar, the substitution corresponding to at least one targeted facial feature of the attendee in the video conference.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram illustrating a distributed data processing environment for a dynamic facial feature substitution program, in accordance with an embodiment of the present invention;

FIG. 2 is a flowchart depicting operational steps of a dynamic facial feature substitution program, on a computer within the data processing environment of FIG. 1, for use with an avatar in a video conference, in accordance with an embodiment of the present invention;

FIG. 3 is a diagram depicting an example of portions of pre-recorded videos used for dynamic facial feature substitution by the dynamic facial feature substitution program of FIG. 2, in accordance with an embodiment of the present invention; and

FIG. 4 is a block diagram of components of the computer in FIG. 1 executing the dynamic facial feature substitution program, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Video conferences provide a cost effective method to allow virtual face to face meetings with global attendees. Video conferences may be used to aid in effective team building, problem solving and status meetings where attendees can use both verbal and visual communication modes. Embodiments of the present invention recognize that not all video conferences occur during regular work hours and that, while meeting attendees should ideally present a business appropriate image or appearance, sometimes this is difficult to do. In addition, some attendees' roles may require only listening for status updates, which may allow them to perform other tasks, e.g. check mail, messages or watch a stock report, during the meeting.

Embodiments of the present invention provide a video conference attendee with the capability to attend a video conference without business attire or without appropriate grooming such as shaving or applying make-up. The meeting attendee may use a pre-recorded video or an avatar in place of a real-time video feed of the meeting attendee. Embodiments of the present invention provide the avatar, which can mimic or mirror the real-time motions or facial expressions of the attendee, by using facial recognition to substitute targeted facial features in the avatar with portions of pre-recorded video corresponding to the facial expressions of the attendee in the video conference.

Dynamic facial feature substitution of targeted facial features (e.g. the attendee's eye area, eye brow area, nose area and mouth area) in an avatar with pre-recorded video of the attendee's targeted facial features that closely matches the attendee's facial features in a real-time video feed provides an animated avatar capable of mimicking the attendee's facial movements. In embodiments of the present invention, dynamic facial feature substitution occurs by substituting portions of extracted video of facial areas of the pre-recorded video (e.g. the mouth or eye area) using facial recognition software to correlate or match the real-time video of the attendee, in particular, the attendee's facial features and movements, to a similar pre-recorded video exhibiting the same or similar facial expressions, facial movements or articulations. The pre-recorded video records the attendee presented appropriately for a video conference, for example, dressed in appropriate business attire, shaved, hair combed, and presenting their best or desired appearance.

The present invention will now be described in detail with reference to the Figures. FIG. 1 is a functional block diagram illustrating a distributed data processing environment, generally designated 100, in accordance with one embodiment of the present invention. Distributed data processing environment 100 includes network 110, server 120, video display 125, video camera 130 and computer 140, in accordance with an exemplary embodiment of the present invention.

In the exemplary embodiment, network 110 is the Internet representing a worldwide collection of networks and gateways that use TCP/IP protocols to communicate with one another. Network 110 may include any number of cables, routers, switches and/or firewalls. Server 120, video display 125, video camera 130 and computer 140 are interconnected by network 110. Network 110 can be any combination of connections and protocols capable of supporting communications between server 120, video display 125, video camera 130 and computer 140, including communication with dynamic facial feature substitution program 150. In other embodiments, network 110 may also be implemented as a number of different types of networks, such as an intranet, a local area network (LAN), a virtual local area network (VLAN), a wide area network (WAN), or any combination of a number of different types. FIG. 1 is intended as an example, and not as an architectural limitation for the different embodiments.

In the exemplary embodiment, server 120 may be, for example, a web server, a server computer such as a management server, or any other electronic device, computing device, or computing system capable of sending and receiving data. In another embodiment, server 120 represents a “cloud” of computers interconnected by one or more networks, where server 120 is a computing system utilizing clustered computers and components to act as a single pool of seamless resources when accessed through network 110. Server 120 includes video conference feed controller 122, which receives video from computer 140 that may be displayed on video display 125. Server 120 can connect to a video conference using video conference feed controller 122 and send video, such as a video feed of the video conference, to computer 140 via network 110.

Computer 140 includes display 145, dynamic facial feature substitution program 150, and video storage 155. In the exemplary embodiment, computer 140 is a client to server 120. Computer 140 may be a notebook, a laptop, a smartphone, a personal digital assistant (PDA), a tablet computer, a desktop computer, a wearable computing device or any other computing device or system capable of communicating with server 120 through network 110. In the exemplary embodiment, computer 140 receives and sends video recorded from video camera 130, which is stored in video storage 155. In another embodiment, computer 140 may provide the ability to record video, and to send and to receive video, such as may be accomplished with a smartphone or a tablet computer with video capability. In the exemplary embodiment, computer 140 receives one or more video feeds for a video conference or web conference as coordinated, integrated and received from video conference feed controller 122 via network 110 and shown on display 145. Display 145, which may also be a user interface, displays to a user a video feed from a video conference. In the exemplary embodiment, computer 140 may send video, which may be a real-time video from video camera 130 or a pre-recorded video retrieved from video storage 155, to video conference feed controller 122 for viewing on video display 125 by other attendees in the video conference. In one embodiment, computer 140 may not be a client device to server 120 but may be connected via network 110 with one or more computing devices such as smart phones, laptops, wearable computing devices or notebooks, each of which has video conference applications and video capability. In another embodiment, dynamic facial feature substitution program 150 is partially or fully integrated on server 120, or a remote “cloud” server such as a computer or a group of computing machines connected by a LAN or WAN. Computer 140 may include internal and external hardware components, as depicted and described in further detail with respect to FIG. 4.

In the exemplary embodiment, dynamic facial feature substitution program 150 on computer 140 utilizes facial recognition software to correlate or match a real-time video feed of an attendee's facial expression, facial movements and articulations to pre-recorded video of the attendee. In an embodiment, dynamic facial feature substitution program 150 may match portions of the pre-recorded video stored in video storage 155 to targeted facial features in the real-time video of the attendee in the video conference. In the exemplary embodiment, dynamic facial feature substitution program 150 receives video from video camera 130 and video conference feeds from video conference feed controller 122 for analysis and sends video to video conference feed controller 122 on server 120 via network 110 for display in the video conference. In other embodiments, dynamic facial feature substitution program 150 may receive and send one or more video feeds from other computing devices such as smart phones or wearable computing devices via network 110.

Dynamic facial feature substitution program 150 receives one or more pre-recorded videos of the attendee recorded on video camera 130 in various poses, exhibiting common facial expressions (e.g. smiling, neutral, attentively listening, frowning or laughing), and speaking numerous words and phrases commonly used in a video conference. Dynamic facial feature substitution program 150 stores the pre-recorded videos from video camera 130 in video storage 155. Upon receiving a request from a video conference attendee for an avatar with dynamic facial feature substitution, dynamic facial feature substitution program 150 selects an avatar by matching a pre-recorded video with a pose or a facial expression to the attendee's pose or facial expression in a real-time video feed or live video of the attendee. Dynamic facial feature substitution program 150 matches the pre-recorded video facial features with the real-time video facial features using facial recognition, which may be augmented with shape recognition algorithms to match the attendee's body positioning or pose. The real-time video feed is a substantially “real-time” video feed or a near real-time video feed which may have some delay due to data transmission (e.g. over cables, wires, networks, etc.). Dynamic facial feature substitution program 150 may retrieve the pre-recorded video or avatar from video storage 155 and insert the avatar into the video feed sent to video conference feed controller 122 for the video conference. The avatar substitutes for the real-time video feed of the attendee in the video conference and is displayed to other attendees of the video conference via video display 125.

Dynamic facial feature substitution program 150 provides dynamic facial feature substitution in which the avatar mimics the attendee's facial feature movements. Using facial recognition software to match the attendee's facial feature expressions, articulations and facial motions in the live or real-time video feed to the facial feature expressions, articulations and motions of targeted facial features in a pre-recorded video, dynamic facial feature substitution program 150 extracts the portions of the pre-recorded video corresponding to the attendee's targeted facial features. Using a co-ordinate map of one or more key facial elements, for example, a corner of the eyes, a center of the nose, and a corner of the mouth determined by dynamic facial feature substitution program 150, the program inserts the portions of the pre-recorded video into the avatar at corresponding points on the co-ordinate map. The portions of the pre-recorded video matching the attendee's facial expressions, words, or facial movements in the real-time video are substituted for the targeted facial features in the avatar. The avatar with dynamic facial feature substitution may be transmitted over network 110 to video conference feed controller 122 on server 120 for use in place of a real-time video feed of the attendee to represent, via video display 125, the requesting attendee to the other video conference attendees viewing or attending the video conference, either within a conference room or single location, or remotely via network 110. Dynamic facial feature substitution program 150 provides the attendee with the capability to use an avatar mirroring the attendee's targeted facial features, thus providing a fully pre-recorded video of the attendee capable of mimicking the real-time reactions, articulations and facial movements of the attendee in the video conference.
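The insertion of a portion at corresponding points on the co-ordinate map can be illustrated with a minimal sketch. It assumes frames are small grayscale grids held as nested lists and that each co-ordinate map has already been reduced to a single anchor point (for example, a corner of the mouth). The names `paste_portion`, `Frame` and `Point` are hypothetical and do not come from the specification.

```python
from typing import List, Tuple

Frame = List[List[int]]   # assumption: grayscale frame as rows of pixel values
Point = Tuple[int, int]   # (row, column) of one key facial element


def paste_portion(avatar: Frame, portion: Frame,
                  anchor_in_avatar: Point, anchor_in_portion: Point) -> Frame:
    """Return a copy of `avatar` with `portion` pasted so both anchors coincide."""
    out = [row[:] for row in avatar]
    # Offset that moves the portion's anchor onto the avatar's anchor.
    top = anchor_in_avatar[0] - anchor_in_portion[0]
    left = anchor_in_avatar[1] - anchor_in_portion[1]
    for r, row in enumerate(portion):
        for c, value in enumerate(row):
            rr, cc = top + r, left + c
            # Pixels that would fall outside the avatar frame are skipped.
            if 0 <= rr < len(out) and 0 <= cc < len(out[0]):
                out[rr][cc] = value
    return out
```

A real implementation would likely align several key facial elements at once and may need scaling or rotation; this sketch only translates the portion.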

Video storage 155 included on computer 140 stores videos and portions of video recorded by video camera 130 or a similar recording device capable of recording and sending video to computer 140. In an embodiment, video storage 155 receives portions of the pre-recorded videos from dynamic facial feature substitution program 150. Dynamic facial feature substitution program 150 may identify the portions by the targeted facial features in the portion, such as eye area, eyebrow area, nose area and mouth area, and may further identify them by facial expression, facial movements, sentiment exhibited or words spoken. In the exemplary embodiment of the present invention, dynamic facial feature substitution program 150 retrieves pre-recorded videos stored in video storage 155 for use as an avatar in the video conference. In an embodiment, dynamic facial feature substitution program 150 retrieves from video storage 155 one or more of the portions of the pre-recorded videos of the targeted facial features for use in facial feature substitution in the avatar to mimic or match facial expressions and articulations of the attendee in the real-time video feed. While depicted on computer 140 in the exemplary embodiment, video storage 155 may be included on a remote server, a web server, or a “cloud” of computers interconnected by one or more networks utilizing clustered computers and components to act as a single pool of seamless resources, accessible by dynamic facial feature substitution program 150 via network 110.

FIG. 2 is a flowchart depicting operational steps of dynamic facial feature substitution program 150, on a computer within data processing environment 100, for use with an avatar in a video conference, in accordance with an embodiment of the present invention.

In step 202, dynamic facial feature substitution program 150 receives one or more pre-recorded videos of the attendee. A video conference attendee desiring an avatar for use with dynamic facial feature substitution program 150 in a video conference pre-records videos to capture the attendee's articulations and facial expressions. For the pre-recorded videos, the attendee presents the desired video conference appearance. For example, the groomed attendee wears business appropriate dress for a business video conference. The pre-recorded videos capture the attendee in various poses, with various expressions, and reading numerous phrases and words. Dynamic facial feature substitution program 150 may provide the numerous phrases and words spoken by the attendee to capture the most commonly used words and expressions in similar video conferences (e.g. business video conferences, technical video conferences, artistic or music video conferences). The pre-recorded videos capture the attendee's articulations, facial movements and facial expressions exhibited for the various spoken words and phrases, such as colloquial verbal phrases and words commonly used in video conferences. For example, a pre-recorded video of an attendee introducing himself may include the appropriate phrase and a smile. The pre-recorded videos can show various facial expressions exhibiting one or more reactions, emotions or emotional states such as happy (e.g. smiling), frustrated, neutral (e.g. relaxed facial features) or amused (e.g. laughing). The videos exhibit a range of emotions or various levels of an emotion, for example, slightly happy with a slight, closed lip smile or very happy with a large open lip smile. In one embodiment, the pre-recorded videos are received with descriptive filenames relating to the video content. For example, one video with a filename “Introduction” includes the pre-recorded video of the attendee introducing himself to the video conference. In an embodiment of the present invention, dynamic facial feature substitution program 150 may create a co-ordinate map of the attendee's key or targeted facial features, such as the corners of the eyes, edges of the eyebrows, nose or edges of the mouth, for each of the pre-recorded videos. The co-ordinate map may track and map the attendee's facial movements.

In another embodiment, a single video recording of the user may capture all of the poses, spoken words, phrases and emotions anticipated to be exhibited in a video conference. Dynamic facial feature substitution program 150 can identify sections or segments of the video using an index of the video or markers within the video to identify desired video depicting facial movements for various words, phrases, reactions or emotions. For example, a video may have recorded, in minutes one to two, the attendee introducing themselves and, in minutes three to four, the attendee saying a commonly used phrase such as “I agree, but have we considered other options that may save cost?”
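The index of a single recording can be sketched as a small list of labeled time ranges. This is a minimal illustration only: the `Segment` type, the label strings and the second offsets in `INDEX` are assumptions chosen to mirror the minutes in the example above, not a format defined by the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    label: str      # hypothetical marker name for the segment
    start_s: float  # segment start, in seconds from the start of the recording
    end_s: float    # segment end (exclusive), in seconds


def find_by_label(index: List[Segment], label: str) -> Optional[Segment]:
    """Return the first segment carrying `label`, or None if absent."""
    for seg in index:
        if seg.label == label:
            return seg
    return None


def label_at(index: List[Segment], t: float) -> Optional[str]:
    """Return the label of the segment covering time `t`, or None."""
    for seg in index:
        if seg.start_s <= t < seg.end_s:
            return seg.label
    return None


# Illustrative index: an introduction, then a commonly used phrase.
INDEX = [
    Segment("introduction", 60.0, 120.0),
    Segment("agree_other_options", 180.0, 240.0),
]
```

With such an index, the program could seek directly to the introduction segment instead of storing each expression as a separate file.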

In one embodiment, dynamic facial feature substitution program 150 may extract portions of the pre-recorded video corresponding to targeted facial features, for example, the eye area, the eye brow area or the mouth area, and store the individual portions of the pre-recorded video of the attendee. The individual portions of pre-recorded video stored in video storage 155 may be retrieved for use in an avatar. Dynamic facial feature substitution program 150 may create a co-ordinate map of the outer edges of the portions of the pre-recorded video (e.g. the boxes containing the video chunks or portions as depicted in FIG. 3).
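Extracting the boxed portions from a frame can be sketched as a simple crop per targeted facial feature. The sketch assumes a frame is a nested list of pixel values and that the boxes are already known; `Box`, `crop` and `extract_portions` are hypothetical names, and locating the boxes (the job of the facial recognition software) is outside its scope.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]
# Assumed box layout: (top, left, bottom, right), with bottom/right exclusive.
Box = Tuple[int, int, int, int]


def crop(frame: Frame, box: Box) -> Frame:
    """Return the rectangular region of `frame` described by `box`."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def extract_portions(frame: Frame, boxes: Dict[str, Box]) -> Dict[str, Frame]:
    """Crop one portion per targeted facial feature, keyed by feature name."""
    return {name: crop(frame, box) for name, box in boxes.items()}
```

Keeping each box alongside its cropped portion gives the outer-edge co-ordinates needed later to place the portion back into the avatar.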

In the exemplary embodiment, video camera 130 records the individual videos or video segments and sends the videos to computer 140 for storage in video storage 155. In one embodiment, computer 140, which may be, for example, a smartphone, a wearable computing device or a tablet, records the videos and stores the videos in video storage 155. In another embodiment, the one or more pre-recorded videos may be stored as files resident on the computer or in computer memory such as persistent storage 408 in FIG. 4.

In step 204, dynamic facial feature substitution program 150 receives a request for an avatar with dynamic facial feature substitution. Dynamic facial feature substitution program 150 receives, from a user via a user interface on display 145, an attendee generated request to use an avatar with dynamic facial feature substitution in place of a real-time video feed of the attendee in a video conference.

In step 206, dynamic facial feature substitution program 150 determines an avatar for a video conference. Dynamic facial feature substitution program 150 may determine the avatar or pre-recorded video of the attendee to be used for beginning the video conference in one of several ways. In one embodiment of the present invention, dynamic facial feature substitution program 150 correlates or matches the attendee's pose in the real-time video recording just prior to initiating the avatar to a pose in a video of the various pre-recorded videos. The real-time video recording may be initiated automatically by dynamic facial feature substitution program 150 when a video conference attendee requests an avatar in embodiments of the present invention where video camera 130 is a web camera, for example, connected to computer 140 or integrated into computer 140. In other embodiments where video camera 130 is connected to computer 140 via cables or other connections, the real-time video may be started manually by the attendee. Dynamic facial feature substitution program 150 may utilize shape recognition software on key body points or body shape of a pose to determine a pre-recorded video with a similar pose to the real-time video of the attendee. In an embodiment, facial recognition software may be used to match a real-time video feed of the attendee's face to pre-recorded video. The pre-recorded video corresponding to the attendee's facial expression in the initial real-time video feed at the time of the request for a dynamic facial substitution may be used as the avatar. The request for an avatar using dynamic facial feature substitution may occur at the start of the video conference or at any time in the video conference. When the request for an avatar is received during the video conference, the avatar may be selected or determined by one or more of facial recognition, shape recognition, natural language processing, speech recognition, or sentiment analysis of the attendee or the meeting discussions.

In another embodiment, a pre-recorded video for initiating the video conference may be a pre-set or default selection for the initially used pre-recorded video or avatar in a video conference. A pre-recorded introduction video may, for example, show the attendee with a slight smile, a nod, or introducing themselves by name. In a different embodiment, dynamic facial feature substitution program 150 may determine the initial avatar or pre-recorded video for the start of the video conference by randomly selecting one of several pre-recorded videos of the attendee in a neutral or relaxed pose. In one embodiment, dynamic facial feature substitution program 150 may receive from a user interface on display 145 a user or attendee selected avatar to start the video conference. In yet another embodiment, dynamic facial feature substitution program 150 may determine a pre-recorded video or avatar to use initially based on a filename of the video, for example, a video labeled “introduction” in video storage 155.
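The alternatives for choosing the initial avatar can be combined in a single selection routine, sketched below. The specification presents these as separate embodiments; the fallback order used here (attendee selection, then default, then filename, then random neutral video) is purely an assumption for illustration, as are the function and parameter names.

```python
import random
from typing import Optional, Sequence


def choose_initial_avatar(filenames: Sequence[str],
                          user_choice: Optional[str] = None,
                          default: Optional[str] = None,
                          neutral: Sequence[str] = (),
                          rng: Optional[random.Random] = None) -> str:
    """Pick the pre-recorded video used as the avatar at the start of the call."""
    available = set(filenames)
    # 1. An avatar explicitly selected by the attendee.
    if user_choice in available:
        return user_choice
    # 2. A pre-set or default selection.
    if default in available:
        return default
    # 3. A video whose filename marks it as an introduction.
    for name in filenames:
        if "introduction" in name.lower():
            return name
    # 4. A random choice among videos tagged as neutral or relaxed poses.
    candidates = [name for name in neutral if name in available]
    if candidates:
        return (rng or random).choice(candidates)
    raise LookupError("no suitable pre-recorded video available")
```

Passing a seeded `random.Random` as `rng` makes the random fallback reproducible, which can help when testing.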

In step 208, dynamic facial feature substitution program 150 retrieves and inserts the avatar. Dynamic facial feature substitution program 150 retrieves from video storage 155 the avatar or pre-recorded video determined for initial use in place of the real-time video feed of the attendee. Dynamic facial feature substitution program 150 inserts the avatar in place of the real-time video feed of the attendee in the video conference via network 110 and video conference feed controller 122.

In step 210, dynamic facial feature substitution program 150 determines portions of the pre-recorded videos for targeted facial feature substitution in the avatar. Using known facial recognition methods, facial analysis software, and feature recognition software and algorithms, dynamic facial feature substitution program 150 isolates and analyzes targeted facial features in the real-time video feed of the attendee, for example, an eye area or a mouth area, and corresponding facial movements and expressions of the targeted facial features, as the attendee speaks or moves. In an embodiment, the co-ordinate map created by dynamic facial feature substitution program 150 may track or map the facial movements of the attendee. In the exemplary embodiment, dynamic facial feature substitution program 150 determines the key or targeted features to be mapped.

Dynamic facial feature substitution program 150 determines the portions for targeted facial feature substitution by correlating or matching the targeted facial features and the movements of the targeted facial features in the real-time video feed of the attendee to one or more stored portions of the pre-recorded videos of the attendee using facial recognition. A portion of a pre-recorded video is a portion or a discrete piece (e.g. “chunk”) of the pre-recorded videos that includes a targeted facial feature, for example, the attendee's eye area, which may be extracted from the pre-recorded video of the attendee. The portions of the targeted facial features include, for example, an eye area, an eye brow area, a nose area and a mouth area as illustrated later with reference to FIG. 3. While discussed as the eye area, eye brow area, nose area and mouth area, the portions of the pre-recorded video of the targeted facial features are not limited to these areas but may be a subset of these areas or may include different or larger areas such as the whole face or other parts of the face.

Dynamic facial feature substitution program 150 isolates the targeted facial features and creates one or more portions of the pre-recorded videos of the attendee speaking or exhibiting facial expressions such as laughing that may be inserted into an avatar or pre-recorded video of the attendee. Dynamic facial feature substitution program 150 inserts the portions of the pre-recorded video which match or correspond to the real-time facial expressions, articulations and movements to mimic or match the attendee's real-time facial expressions or articulations, as captured in the real-time video from video camera 130. In one embodiment, the portions of the pre-recorded video for the targeted facial features may be extracted from the pre-recorded video and stored in video storage 155 for re-use.

In the exemplary embodiment, dynamic facial feature substitution program 150 determines one or more targeted facial features in the real-time video feed used for creating the portions of the pre-recorded video, such as the eye area, eye brow area and mouth area. In one embodiment, the attendee may specify the targeted facial features or target areas for creating portions of the pre-recorded videos for substitution in the pre-recorded video by highlighting the desired area or areas on one of the pre-recorded videos or a still image extracted from the pre-recorded video. In an embodiment, the attendee may select to create portions of the pre-recorded video which include the whole face.

In the exemplary embodiment, dynamic facial feature substitution program 150 matches or correlates the targeted facial features in the real-time video feed of the attendee using facial recognition to a corresponding pre-recorded video of the attendee and extracts the portions of the pre-recorded video corresponding to the targeted facial features. In another embodiment of the invention, the one or more portions of the pre-recorded videos stored in video storage 155 may be correlated or matched to the targeted facial features in the real-time video feed of the attendee by, for example, a video filename or a video marker name. For example, a real-time video feed of the attendee's mouth area with a slight frown may be matched to a video portion named “mouth_frown1”. In another embodiment, dynamic facial feature substitution program 150 may correlate the pre-recorded video to the real-time video feed of the attendee's whole body using both facial recognition and shape recognition.
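The matching of a real-time feature to a named stored portion can be illustrated with a nearest-neighbour lookup. This is a simplified stand-in for facial recognition, not the method the specification relies on: it assumes each mouth-area portion has already been summarized as a short numeric feature vector, and the vectors and the names other than “mouth_frown1” are invented for illustration.

```python
import math
from typing import Dict, Sequence


def closest_portion(live: Sequence[float],
                    library: Dict[str, Sequence[float]]) -> str:
    """Return the name of the stored portion whose vector is nearest to `live`."""
    # Euclidean distance between the live feature vector and each stored one.
    return min(library, key=lambda name: math.dist(live, library[name]))


# Hypothetical library: (mouth-corner lift, mouth opening) per stored portion.
MOUTH_LIBRARY = {
    "mouth_neutral": (0.0, 0.0),
    "mouth_smile1": (1.0, 0.5),
    "mouth_frown1": (-1.0, -0.5),
}
```

For a live vector such as `(-0.8, -0.4)`, the lookup selects “mouth_frown1”, which could then be retrieved from video storage by that name.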

Dynamic facial feature substitution program 150 selects or determines the portions of the pre-recorded video that match or mimic the attendee's real-time facial expressions and articulations in the real-time video feed by, for example, correlating the facial expressions and articulations of the real-time video feed to the pre-recorded video by analyzing key features and targeted facial features. When the selected or determined portions of the pre-recorded video (e.g. video chunks or extracted portions or parts of the pre-recorded video of the targeted facial features such as the eye area or the mouth area) are substituted into an avatar using dynamic facial feature substitution, a completely pre-recorded avatar mimicking or mirroring the actions of the “real-time” attendee is created. The avatar with dynamic facial feature substitution mirrors the real-time actions and facial feature movements of the attendee without including any real-time video of the attendee. Dynamic facial feature substitution program 150 is capable of creating an avatar depicting real-time facial features, articulations, and facial expressions of a video conference attendee without using real-time video of the attendee who, for example, may not have time to shave before the meeting.

In one embodiment of the present invention, a three dimensional face recognition system which uses three dimensional facial recognition algorithms and techniques may be used on the real-time video feed, using known methods to create a three dimensional video capture, such as projecting a grid on the attendee's face and integrating the video capture into a three dimensional model. In this embodiment, a similar three dimensional video technique may be applied to the pre-recorded videos of the attendee. Three dimensional facial recognition algorithms may be applied to match corresponding pre-recorded video with the targeted facial features to a real-time video feed of the attendee. A three dimensional facial recognition system may provide another accurate correlation of facial features.

In another embodiment, dynamic facial feature substitution program 150 may use natural language processing and speech analysis to correlate words or various phrases to the pre-recorded words or phrases in the portions of the pre-recorded videos. In this embodiment, dynamic facial feature substitution program 150 analyzes the real-time audio feed of the attendee and determines one or more portions of pre-recorded videos to use in the avatar to simulate the attendee's spoken word and articulation by correlating the words and phrases using speech analysis to the associated or matching words and phrases in the pre-recorded video or the portions of the pre-recorded video. The analysis of the attendee's spoken words or verbally expressed emotions such as laughter may be used when the attendee does not have access to a video recording device or does not wish to use a real-time video recording but still desires the use of an avatar with facial feature substitution to provide a real-time video visually depicting the attendee's facial feature motions or articulations for the video conference.

In a further embodiment, for situations when the user is in listen-only mode for a video conference or does not have access to a video recording device, dynamic facial feature substitution program 150 can use natural language processing and sentiment analysis of the discussions in the video conference to determine the meeting tone or sentiment and correlate the meeting tone to corresponding portions of the pre-recorded videos using facial analysis of the pre-recorded videos. For example, when meeting members share a joke and laugh, dynamic facial feature substitution program 150 may use facial recognition software to correlate a jovial meeting tone (e.g. a meeting with laughter) with a pre-recorded video of the attendee laughing. Dynamic facial feature substitution program 150 may retrieve the portions of targeted facial features of the pre-recorded videos correlated to the jovial meeting tone to insert into the avatar.
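The correlation of meeting tone to stored portions can be sketched with a deliberately crude keyword count standing in for natural language processing and sentiment analysis. The keyword sets, tone labels and portion names below are all assumptions made for illustration; a practical system would use a proper sentiment model.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical cue words per tone; a stand-in for real sentiment analysis.
TONE_KEYWORDS: Dict[str, Set[str]] = {
    "jovial": {"haha", "lol", "funny", "joke"},
    "tense": {"concern", "delay", "problem", "risk"},
}

# Hypothetical mapping from meeting tone to stored portions of targeted features.
PORTIONS_BY_TONE: Dict[str, List[str]] = {
    "jovial": ["eyes_laugh1", "mouth_laugh1"],
    "tense": ["eyes_attentive1", "mouth_neutral1"],
    "neutral": ["eyes_neutral1", "mouth_neutral1"],
}


def meeting_tone(utterances: Sequence[str]) -> str:
    """Return the tone with the most keyword hits, or 'neutral' if none hit."""
    scores = {tone: 0 for tone in TONE_KEYWORDS}
    for utterance in utterances:
        words = {w.strip(".,!?").lower() for w in utterance.split()}
        for tone, keywords in TONE_KEYWORDS.items():
            scores[tone] += len(words & keywords)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def portions_for_meeting(utterances: Sequence[str]) -> List[str]:
    """Select the stored portions correlated with the detected meeting tone."""
    return PORTIONS_BY_TONE[meeting_tone(utterances)]
```

In this sketch a laughing exchange selects the laughing eye and mouth portions, which would then be inserted into the avatar as in the other embodiments.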

In step 212, dynamic facial feature substitution program 150 retrieves portions of the pre-recorded videos. Dynamic facial feature substitution program 150 retrieves from video storage 155 the one or more portions of the pre-recorded videos of the targeted facial features (i.e. eye area, eye brow area, or mouth area). In the exemplary embodiment, the retrieved portions of the pre-recorded video include a co-ordinate map of the key facial elements and the facial movements. In an embodiment, the facial recognition analysis of the real-time video feed may be correlated to the portions of the pre-recorded videos and retrieved by one of the following: a video filename, markers in a pre-recorded video, or an index name or number in a pre-recorded video.
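One way to picture the retrieval in step 212 is a record per stored portion that can be looked up by any of the keys named above. The Portion record, its field names and the retrieve function are hypothetical; the text does not specify how video storage 155 is organized.

```python
from dataclasses import dataclass, field


@dataclass
class Portion:
    """Hypothetical record for one stored portion of a pre-recorded video."""
    feature: str    # targeted facial feature, e.g. "eye", "eyebrow" or "mouth"
    filename: str   # video filename
    marker: str     # marker within the pre-recorded video
    index: int      # index number within the pre-recorded video
    coordinate_map: dict = field(default_factory=dict)  # key facial element -> (x, y)


def retrieve(storage, feature, *, filename=None, marker=None, index=None):
    """Return the first portion for the feature that matches any supplied key, else None."""
    for portion in storage:
        if portion.feature != feature:
            continue
        if filename is not None and portion.filename == filename:
            return portion
        if marker is not None and portion.marker == marker:
            return portion
        if index is not None and portion.index == index:
            return portion
    return None
```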

In step 214, dynamic facial feature substitution program 150 substitutes the portions of the pre-recorded videos into the avatar. Subsequent to retrieving the one or more pre-recorded portions of the pre-recorded videos corresponding to the attendee's real-time facial feature, movement or expression, dynamic facial feature substitution program 150 inserts the portions of the pre-recorded videos into the avatar. The avatar, with inserted portions of the pre-recorded video, is sent by dynamic facial feature substitution program 150 via network 110 to video conference feed controller 122. Video conference feed controller 122 may display the avatar on video display 125 for the remote video conference attendees and send the video feed with the avatar to other video conference locations. Dynamic facial feature substitution program 150 utilizes the co-ordinate maps of key facial elements and facial movements created for the portions of the pre-recorded videos and the avatar (e.g. pre-recorded video). By matching the co-ordinate maps of the key facial elements for the portions of the pre-recorded videos and the avatar, the portions of the pre-recorded videos may be inserted into the avatar for the video conference. In various embodiments, known digital blending or smoothing techniques may be applied by dynamic facial feature substitution program 150 to create a seamless video of the avatar for inclusion in place of the real-time video feed of the attendee.
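The co-ordinate map matching may be sketched as follows, under strong simplifying assumptions: each co-ordinate map is a dictionary from key facial element name to an (x, y) pixel position, the portion and the avatar share the same scale and orientation, and frames are nested lists of pixel values. The alignment here is a plain translation between centroids of the shared key elements; blending and smoothing are omitted. All names are hypothetical.

```python
def centroid(points):
    """Mean (x, y) of a co-ordinate map's points."""
    xs = [x for x, _ in points.values()]
    ys = [y for _, y in points.values()]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def placement_offset(portion_map, avatar_map):
    """Offset (dx, dy) that moves the portion's key elements onto the avatar's."""
    shared = portion_map.keys() & avatar_map.keys()
    if not shared:
        raise ValueError("co-ordinate maps share no key facial elements")
    px, py = centroid({k: portion_map[k] for k in shared})
    ax, ay = centroid({k: avatar_map[k] for k in shared})
    return round(ax - px), round(ay - py)


def paste(frame, patch, dx, dy):
    """Return a copy of the avatar frame with the patch inserted at offset (dx, dy)."""
    out = [row[:] for row in frame]
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            y, x = r + dy, c + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value  # pixels falling outside the frame are dropped
    return out
```

A translation is the simplest possible match; an implementation handling changes of pose or scale would presumably need a richer transform.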

In step 216, dynamic facial feature substitution program 150 monitors the real-time video feed. Dynamic facial feature substitution program 150 monitors the real-time video feed of the attendee for changes in pose, facial expressions, facial movements or articulations using facial recognition.

In step 218, dynamic facial feature substitution program 150 determines if the facial features change in the real-time video feed. Dynamic facial feature substitution program 150 monitors the real-time video of the attendee and, using facial recognition algorithms, determines whether changes to the facial features in the real-time video feed of the attendee occur, such as a change in facial expression or a change in articulations (i.e. new words or phrases). When dynamic facial feature substitution program 150 determines that there is a change in the facial features of the attendee in the real-time video (“yes” branch, decision block 218), the program returns to step 210 to determine the one or more portions of the pre-recorded video to be substituted into the avatar in the video conference for the changed facial features.

If dynamic facial feature substitution program 150 determines there is no change in the facial features of the attendee (“no” branch, decision block 218), then the program, in step 220, determines if the attendee exits the program. In the exemplary embodiment, dynamic facial feature substitution program 150 utilizes facial recognition software to identify that there are no changes to the attendee's facial features in the real-time video feed and the portions of the pre-recorded video remain the same. In an embodiment of the present invention, when there is no change in the facial expression or articulations in the real-time video feed of the attendee for a period of time determined by the system, for example four minutes, dynamic facial feature substitution program 150 may randomly insert a facial movement or another video of a similar but slightly different facial expression. In another embodiment, dynamic facial feature substitution program 150 may randomly insert a movement and facial expression consistent with the meeting discussion sentiment as determined by natural language processing and sentiment analysis. In an embodiment, when there is no change in the facial features, dynamic facial feature substitution program 150 determines if the attendee exits the program, for example, to use a real-time feed in the video conference in place of the avatar or because the video conference has ended. If the attendee has exited the program (“yes” branch, decision block 220), the program ends processing. Dynamic facial feature substitution program 150 may be initiated, re-initiated or terminated at any time in the video conference.
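The random insertion after a period without change can be sketched as below. The 240-second limit mirrors the four-minute example in the text; the function name, its parameters and the idea of passing a list of candidate variations are assumptions for illustration.

```python
import random

IDLE_LIMIT_SECONDS = 240  # the "four minutes" example from the text


def idle_variation(seconds_unchanged, current, variations, rng=random):
    """Pick a similar but different portion once the feed has been unchanged long enough."""
    if seconds_unchanged < IDLE_LIMIT_SECONDS:
        return current  # not idle long enough; keep the current portion
    candidates = [v for v in variations if v != current]
    # Fall back to the current portion when no alternative is available.
    return rng.choice(candidates) if candidates else current
```

Passing a seeded random.Random instance as rng makes the choice reproducible, which may be convenient when testing.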

If in step 220, dynamic facial feature substitution program 150 determines that the attendee does exit the program, the avatar is no longer sent to video conference feed controller 122 and the program ends. In another embodiment, the attendee may select to exit dynamic facial feature substitution program 150 by clicking an icon, tab or using another command to exit the program at any time in the video conference independent of the attendee's facial feature changes.

If in step 220, dynamic facial feature substitution program 150 determines that the attendee does not exit the program, then dynamic facial feature substitution program 150 proceeds back to step 216 and continues monitoring the real-time video feed (“no” branch, decision block 220).
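The flow of steps 216 through 220 can be summarized as a loop. In this sketch the monitored feed is reduced to a sequence of (expression, exit_requested) observations, and select_portions and send_avatar are hypothetical callables standing in for steps 210 through 214. Following the flowchart as described, the exit check is only reached when no change is detected; the embodiment in which the attendee may exit at any time is not modeled.

```python
def run_session(observations, select_portions, send_avatar):
    """Drive the monitor / change-check / exit-check loop over a sequence of observations."""
    current = None
    for expression, exit_requested in observations:  # step 216: monitor the feed
        if expression != current:
            # Step 218, "yes" branch: determine, retrieve and substitute new portions.
            current = expression
            send_avatar(select_portions(expression))
        elif exit_requested:
            # Step 220, "yes" branch: stop sending the avatar and end.
            break
        # Step 220, "no" branch: continue monitoring.
```

For example, with observations of a smile held for two frames followed by speech and then an exit request, the avatar is updated once for the smile and once for the speech before the loop ends.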

FIG. 3 is a diagram, generally designated 300, depicting an example of portions of targeted facial features in the pre-recorded video used for dynamic facial feature substitution by a dynamic facial feature substitution program, in accordance with an embodiment of the present invention. Face 301 depicts targeted facial features including the eye area, the eye brow area, the nose area and the mouth area. The boxes around the targeted facial features indicate an example of the determined one or more portions of the pre-recorded videos. Boxes 311 depict a portion of the pre-recorded video that is extracted or determined by dynamic facial feature substitution program 150 for the eye brow area. Similarly, boxes 312 depict the portion of the pre-recorded video that is extracted or determined by dynamic facial feature substitution program 150 for the eye area. Box 313 depicts the portion of the pre-recorded videos extracted or determined for the nose area and box 314 depicts the portion of the pre-recorded videos extracted or determined for the mouth area. The pre-recorded portions of the videos (e.g. portions or chunks of the pre-recorded video from the targeted facial features such as the areas depicted by boxes 311, boxes 312, box 313 and box 314) may be substituted in place of the corresponding portions of the face or facial features in the avatar based on a facial recognition analysis of the real-time video feed of the attendee. These portions or chunks of the pre-recorded video may be replaced independently as determined by dynamic facial feature substitution program 150 over a pre-recorded video or a cycling video used as an avatar of the attendee.
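The independent replacement of portions can be pictured as a per-region update in which only the regions whose detected portion differs are swapped. The region names follow boxes 311 through 314 of FIG. 3; the dictionary representation, clip identifiers and function name are assumptions.

```python
# Regions corresponding to boxes 311 (eye brow), 312 (eye), 313 (nose) and 314 (mouth).
REGIONS = ("eyebrow", "eye", "nose", "mouth")


def update_regions(avatar_regions, detected):
    """Replace only the regions whose detected portion differs from the avatar's current one."""
    updated = dict(avatar_regions)
    changed = []
    for region in REGIONS:
        clip = detected.get(region)
        if clip is not None and updated.get(region) != clip:
            updated[region] = clip
            changed.append(region)
    return updated, changed
```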

While depicted in FIG. 3 as the eye area, eye brow area, nose area and mouth area, the portions of the pre-recorded video (e.g. video portions of the targeted facial features) should not be limited to the depicted areas but may be a subset of the illustrated areas or may include different or larger areas such as the whole face or other parts of the face.

FIG. 4 depicts a block diagram of components of computer 140 in accordance with an illustrative embodiment of the present invention. It should be appreciated that FIG. 4 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made.

Computer 140 includes communications fabric 402, which provides communications between computer processor(s) 404, memory 406, persistent storage 408, communications unit 410, and input/output (I/O) interface(s) 412. Communications fabric 402 can be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications and network processors, etc.), system memory, peripheral devices, and any other hardware components within a system. For example, communications fabric 402 can be implemented with one or more buses.

Memory 406 and persistent storage 408 are computer readable storage media. In this embodiment, memory 406 includes random access memory (RAM) 414 and cache memory 416. In general, memory 406 can include any suitable volatile or non-volatile computer readable storage media.

Dynamic facial feature substitution program 150 can be stored in persistent storage 408 for execution by one or more of the respective computer processors 404 via one or more memories of memory 406. In this embodiment, persistent storage 408 includes a magnetic hard disk drive. Alternatively, or in addition to a magnetic hard disk drive, persistent storage 408 can include a solid state hard drive, a semiconductor storage device, read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, or any other computer readable storage media that is capable of storing program instructions or digital information.

The media used by persistent storage 408 may also be removable. For example, a removable hard drive may be used for persistent storage 408. Other examples include optical and magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer readable storage medium that is also part of persistent storage 408.

Communications unit 410, in these examples, provides for communications with other data processing systems or devices, including resources of data processing environment 100 and computer 140 and server 120. In these examples, communications unit 410 includes one or more network interface cards. Communications unit 410 may provide communications through the use of either or both physical and wireless communications links. Dynamic facial feature substitution program 150 may be downloaded to persistent storage 408 through communications unit 410.

I/O interface(s) 412 allows for input and output of data with other devices that may be connected to computer 140. For example, I/O interface 412 may provide a connection to external devices 418 such as a keyboard, a keypad, a touch screen, and/or some other suitable input device. External device(s) 418 can also include portable computer readable storage media such as, for example, thumb drives, portable optical or magnetic disks, and memory cards. Software and data used to practice embodiments of the present invention, e.g., dynamic facial feature substitution program 150, can be stored on such portable computer readable storage media and can be loaded onto persistent storage 408 via I/O interface(s) 412. I/O interface(s) 412 also connect to a display 420.

Display 420 provides a mechanism to display data to a user and may be, for example, a computer monitor.

The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be any tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, a special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, a segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The terminology used herein was chosen to best explain the principles of the embodiment, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

What is claimed is:
 1. A method for facial feature substitution in a video conference, the method comprising: receiving, by one or more computing devices, one or more pre-recorded videos of an attendee of a video conference; and substituting, by one or more computing devices, one or more portions of the one or more pre-recorded videos into an avatar, the substitution corresponding to at least one targeted facial feature of the attendee.
 2. The method of claim 1, wherein substituting, by one or more computing devices, the one or more portions of the one or more pre-recorded videos into the avatar further comprise: creating, by one or more computing devices, a co-ordinate map of one or more key facial elements of the attendee in the one or more pre-recorded videos; creating, by one or more computing devices, a co-ordinate map of one or more key facial elements of the attendee in the video of the attendee in the video conference; matching, by one or more computing devices, the co-ordinate map of the one or more pre-recorded videos to the co-ordinate map of the video of the attendee in the video conference; and substituting, by one or more computing devices, based, at least in part, on the matched co-ordinate maps, the one or more portions of the one or more pre-recorded videos into the avatar.
 3. The method of claim 1, wherein the one or more portions of the one or more pre-recorded videos of the attendee correspond to the at least one targeted facial feature of the attendee in the video of the attendee in the video conference.
 4. The method of claim 1, further comprising determining, by one or more computing devices, the one or more portions of the one or more pre-recorded videos correspond to the at least one targeted facial feature of the attendee in the video of the attendee in the video conference using one or more facial recognition algorithms to correlate facial expressions and facial movements in the video to the one or more pre-recorded videos.
 5. The method of claim 1, further comprising determining, by one or more computing devices, the one or more portions of the one or more pre-recorded videos correspond to the at least one targeted facial feature of the attendee in the video of the attendee in the video conference using at least one of natural language processing and speech recognition to match corresponding one or more words in the video of the attendee to one or more words in the one or more pre-recorded videos.
 6. The method of claim 1, further comprising determining, by one or more computing devices, the one or more portions of the one or more pre-recorded videos correspond to the at least one targeted facial feature of the attendee in the video of the attendee in the video conference using sentiment analysis to correlate the sentiment of the video of the attendee to sentiment of the one or more portions of the one or more pre-recorded videos.
 7. The method of claim 1, further comprising determining, by one or more computing devices, the one or more portions of the one or more pre-recorded videos of the attendee correspond to at least one targeted facial feature of the attendee in the video of the attendee in the video conference using three dimensional facial recognition algorithms to correlate the video of the attendee with one or more portions of the one or more pre-recorded videos.
 8. The method of claim 1, wherein the one or more pre-recorded videos of the attendee include a pre-recorded video with at least one of: an attendee speaking or the attendee exhibiting one or more facial expressions.
 9. A computer program product for facial feature substitution in a video conference, the computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions executable by a processor, the program instructions comprising: program instructions to receive one or more pre-recorded videos of an attendee of a video conference; and program instructions to substitute one or more portions of the one or more pre-recorded videos into an avatar, the substitution corresponding to at least one targeted facial feature of the attendee.
 10. The computer program product of claim 9, wherein the program instructions to substitute the one or more portions of the one or more pre-recorded videos into the avatar further comprise: program instructions to create a co-ordinate map of one or more key facial elements of the attendee in the one or more pre-recorded videos; program instructions to create a co-ordinate map of one or more key facial elements of the attendee in the video of the attendee in the video conference; program instructions to match the co-ordinate map of the one or more pre-recorded videos to the co-ordinate map of the video of the attendee in the video conference; and program instructions to substitute, based, at least in part, on the matched co-ordinate maps, the one or more portions of the one or more pre-recorded videos into the avatar.
 11. The computer program product of claim 9, further comprising program instructions to determine one or more portions of the one or more pre-recorded videos of the attendee correspond to at least one targeted facial feature of the attendee in the video of the attendee in the video conference using facial recognition algorithms to correlate facial expressions and facial movements in the video to the one or more pre-recorded videos.
 12. The computer program product of claim 9, further comprising program instructions to determine one or more portions of the one or more pre-recorded videos of the attendee correspond to at least one targeted facial feature of the attendee in the video of the attendee in the video conference using at least one of natural language processing and speech recognition to match corresponding one or more words in the video of the attendee in the video conference to one or more words in the one or more pre-recorded videos.
 13. The computer program product of claim 9, further comprising program instructions to determine one or more portions of the one or more pre-recorded videos of the attendee correspond to at least one targeted facial feature of the attendee in the video of the attendee in the video conference using sentiment analysis to correlate a sentiment of the video of the attendee in the video conference to a sentiment the one or more portions of the one or more pre-recorded videos.
 14. The computer program product of claim 9, wherein the one or more pre-recorded videos of the attendee include a pre-recorded video with at least one of: an attendee speaking or the attendee exhibiting one or more facial expressions.
 15. A computer system for facial feature substitution in a video conference, the computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions stored on the computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising: program instructions to receive one or more pre-recorded videos of an attendee of a video conference; and program instructions to substitute one or more portions of the one or more pre-recorded videos into an avatar, the substitution corresponding to at least one targeted facial feature of the attendee.
 16. The computer system of claim 15, wherein the program instructions to substitute the one or more portions of the one or more pre-recorded videos into the avatar further comprise: program instructions to create a co-ordinate map of one or more key facial elements of the attendee in the one or more pre-recorded videos; program instructions to create a co-ordinate map of one or more key facial elements of the attendee in the video of the attendee in the video conference; program instructions to match the co-ordinate map of the one or more pre-recorded videos to the co-ordinate map of the video of the attendee in the video conference; and program instructions to substitute, based, at least in part, on the matched co-ordinate maps, the one or more portions of the one or more pre-recorded videos into the avatar.
 17. The computer system of claim 15, further comprising program instructions to determine one or more portions of the one or more pre-recorded videos of the attendee correspond to at least one targeted facial feature of the attendee in the video of the attendee in the video conference using one or more facial recognition algorithms to correlate facial expressions and facial movements in the video to the one or more pre-recorded videos.
 18. The computer system of claim 15, further comprising program instructions to determine one or more portions of the one or more pre-recorded videos of the attendee correspond to at least one targeted facial feature of the attendee in the video of the attendee in the video conference using at least one of natural language processing and speech recognition to match corresponding one or more words in the video of the attendee in the video conference to one or more words in the one or more pre-recorded videos.
 19. The computer system of claim 15, further comprising program instructions to determine one or more portions of the one or more pre-recorded videos of the attendee correspond to at least one targeted facial feature of the attendee in the video of the attendee in the video conference using sentiment analysis to correlate a sentiment of the video of the attendee in the video conference to a sentiment the one or more portions of the one or more pre-recorded videos.
 20. The computer system of claim 15, wherein the one or more pre-recorded videos of the attendee include a pre-recorded video with at least one of: an attendee speaking or the attendee exhibiting one or more facial expressions.